HARDFS: hardening HDFS with selective and lightweight versioning
Authors
Abstract
We harden the Hadoop Distributed File System (HDFS) against fail-silent (non fail-stop) behaviors that result from memory corruption and software bugs using a new approach: selective and lightweight versioning (SLEEVE). With this approach, actions performed by important subsystems of HDFS (e.g., namespace management) are checked by a second implementation of the subsystem that uses lightweight, approximate data structures. We show that HARDFS detects and recovers from a wide range of fail-silent behaviors caused by random bit flips, targeted corruptions, and real software bugs. In particular, HARDFS handles 90% of the fail-silent faults that result from random memory corruption and correctly detects and recovers from 100% of 78 targeted corruptions and 5 real-world bugs. Moreover, it recovers orders of magnitude faster than full reboot by using micro-recovery. The extra protection in HARDFS incurs minimal performance and space overheads.
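The idea of checking a main subsystem against a second, lightweight version can be illustrated with a small sketch. The abstract does not say which approximate data structures are used, so the counting Bloom filter here is an assumption, and every name (`CountingBloomFilter`, `CheckedNamespace`, `VersionMismatch`) is hypothetical rather than taken from HARDFS or HDFS. The sketch only shows the detection side: the main version keeps full state, the second version keeps approximate state, and a disagreement that the approximate structure cannot produce on its own is treated as a fail-silent fault.

```python
import hashlib


class VersionMismatch(Exception):
    """Raised when the main and lightweight versions disagree."""


class CountingBloomFilter:
    """Approximate set: may give false positives, never false negatives."""

    def __init__(self, num_counters=1024, num_hashes=3):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes

    def _indexes(self, key):
        # Derive counter positions from salted SHA-256 digests of the key.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.counters)

    def add(self, key):
        for i in self._indexes(key):
            self.counters[i] += 1

    def remove(self, key):
        for i in self._indexes(key):
            if self.counters[i] > 0:
                self.counters[i] -= 1

    def might_contain(self, key):
        return all(self.counters[i] > 0 for i in self._indexes(key))


class CheckedNamespace:
    """Toy namespace whose answers are checked by an approximate second version."""

    def __init__(self):
        self.main = set()                    # full state (main version)
        self.shadow = CountingBloomFilter()  # approximate state (second version)

    def create(self, path):
        if path in self.main:
            raise FileExistsError(path)
        self.main.add(path)
        self.shadow.add(path)

    def delete(self, path):
        if path not in self.main:
            raise FileNotFoundError(path)
        self.main.remove(path)
        self.shadow.remove(path)

    def exists(self, path):
        answer = path in self.main
        # The filter has no false negatives, so "main says present, shadow
        # says absent" cannot be explained by approximation alone.
        if answer and not self.shadow.might_contain(path):
            raise VersionMismatch(path)
        return answer
```

In this sketch, corrupting `main` directly (for example, adding a path without going through `create`) makes the next `exists` call on that path raise `VersionMismatch`. The opposite disagreement (the filter reports a path that `main` lacks) is not flagged here, because a false positive could explain it.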
Related resources
Mosquito: Another One Bites the Data Upload STream
Mosquito is a lightweight and adaptive physical design framework for Hadoop. Mosquito connects to existing data pipelines in Hadoop MapReduce and/or HDFS, observes the data, and creates better physical designs, i.e. indexes, as a byproduct. Our approach is minimally invasive, yet it allows users and developers to easily improve the runtime of Hadoop. We present three important use cases: first,...
Model-Driven Development of Versioning Systems: An Evaluation of Different Approaches
This paper analyzes the domain of versioning systems and compares three approaches to generating such systems from models. In the first approach, we define a domain-specific modeling language as a lightweight extension of UML and use templates to generate a middleware-based versioning system. In the second approach, we define a domain-specific data definition and manipulation language that can b...
Period of Grace: A New Paradigm for Efficient Soft Error Hardening
In late-age silicon, soft errors become an issue even for low-margin products. Since classical hardening techniques are associated with costs which may not be acceptable for such ICs, selective hardening which targets only a subset of all possible soft errors has been suggested. We propose a soft error selection method based on severity of an error’s impact on system behavior. Some soft errors ...
Lightweight Fault Tolerance in Large-Scale Distributed Graph Processing
The success of Google’s Pregel framework in distributed graph processing has inspired a surging interest in developing Pregel-like platforms featuring a user-friendly “think like a vertex” programming model. Existing Pregel-like systems support a fault tolerance mechanism called checkpointing, which periodically saves computation states as checkpoints to HDFS, so that when a failure happens, co...
A Versatile and User-Oriented Versioning File System
File versioning is a useful technique for recording a history of changes. Applications of versioning include backups and disaster recovery, as well as monitoring intruders’ activities. Alas, modern systems do not include an automatic and easy-to-use file versioning system. Existing backup solutions are slow and inflexible for users. Even worse, they often lack backups for the most recent day’s ...